A Convex Model for Edge-Histogram Specification with Applications to Edge-preserving Smoothing
The goal of edge-histogram specification is to find an image whose edge image
has a histogram that matches a given edge-histogram as much as possible.
Mignotte has proposed a non-convex model for the problem [M. Mignotte. An
energy-based model for the image edge-histogram specification problem. IEEE
Transactions on Image Processing, 21(1):379--386, 2012]. In his work, edge
magnitudes of an input image are first modified by histogram specification to
match the given edge-histogram. Then, a non-convex model is minimized to find
an output image whose edge-histogram matches the modified edge-histogram. The
non-convexity of the model hinders the computations and the inclusion of useful
constraints such as the dynamic range constraint. In this paper, instead of
considering edge magnitudes, we directly consider the image gradients and
propose a convex model based on them. Furthermore, we include additional
constraints in our model based on different applications. The convexity of our
model allows us to compute the output image efficiently using either the
Alternating Direction Method of Multipliers (ADMM) or the Fast Iterative
Shrinkage-Thresholding Algorithm (FISTA). We consider several applications in
edge-preserving smoothing, including image abstraction, edge extraction, detail
exaggeration, and document scan-through removal. Numerical results illustrate
that our method efficiently produces satisfactory results.
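A minimal sketch (in Python/NumPy, not the authors' code) of the pipeline
described above: gradient magnitudes of the input are remapped by histogram
specification, and a convex least-squares model over the image gradients with
a dynamic-range (box) constraint is then minimized. For brevity the sketch
uses plain projected gradient descent in place of ADMM or FISTA, and all
function names and the target histogram are illustrative assumptions.

import numpy as np

def Dx(u):
    """Forward difference along x (zero at the last column)."""
    g = np.zeros_like(u)
    g[:, :-1] = u[:, 1:] - u[:, :-1]
    return g

def Dy(u):
    """Forward difference along y (zero at the last row)."""
    g = np.zeros_like(u)
    g[:-1, :] = u[1:, :] - u[:-1, :]
    return g

def DxT(r):
    """Adjoint of Dx."""
    out = np.zeros_like(r)
    out[:, 0] = -r[:, 0]
    out[:, 1:-1] = r[:, :-2] - r[:, 1:-1]
    out[:, -1] = r[:, -2]
    return out

def DyT(r):
    """Adjoint of Dy."""
    out = np.zeros_like(r)
    out[0, :] = -r[0, :]
    out[1:-1, :] = r[:-2, :] - r[1:-1, :]
    out[-1, :] = r[-2, :]
    return out

def match_to_histogram(m, target_samples):
    """Rank-based histogram specification: remap m so its empirical
    distribution matches that of target_samples."""
    ranks = np.argsort(np.argsort(m.ravel()))
    quantiles = ranks / max(m.size - 1, 1)
    tgt = np.sort(np.asarray(target_samples, dtype=float).ravel())
    grid = np.linspace(0.0, 1.0, tgt.size)
    return np.interp(quantiles, grid, tgt).reshape(m.shape)

def edge_hist_specify(u0, target_samples, iters=300):
    """Find u in [0, 1] whose gradient magnitudes follow the target
    edge-histogram (projected gradient on the convex model)."""
    gx, gy = Dx(u0), Dy(u0)
    mag = np.hypot(gx, gy)
    scale = match_to_histogram(mag, target_samples) / (mag + 1e-8)
    tx, ty = gx * scale, gy * scale            # specified gradient field
    u, step = u0.copy(), 1.0 / 8.0             # step = 1/L, L = ||D^T D|| <= 8
    for _ in range(iters):
        u -= step * (DxT(Dx(u) - tx) + DyT(Dy(u) - ty))
        u = np.clip(u, 0.0, 1.0)               # dynamic-range constraint
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64))                 # stand-in grayscale image
    target = rng.exponential(0.02, size=4096)  # heavy-tailed edge magnitudes
    print(edge_hist_specify(img, target).shape)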
Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Blind face restoration is a highly ill-posed problem that often requires
auxiliary guidance to 1) improve the mapping from degraded inputs to desired
outputs, or 2) complement high-quality details lost in the inputs. In this
paper, we demonstrate that a learned discrete codebook prior in a small proxy
space largely reduces the uncertainty and ambiguity of restoration mapping by
casting blind face restoration as a code prediction task, while providing rich
visual atoms for generating high-quality faces. Under this paradigm, we propose
a Transformer-based prediction network, named CodeFormer, to model the global
composition and context of the low-quality faces for code prediction, enabling
the discovery of natural faces that closely approximate the target faces even
when the inputs are severely degraded. To enhance adaptiveness to different
degradations, we also propose a controllable feature transformation
module that allows a flexible trade-off between fidelity and quality. Thanks to
the expressive codebook prior and global modeling, CodeFormer outperforms the
state of the art in both quality and fidelity, showing superior robustness to
degradation. Extensive experimental results on synthetic and real-world
datasets verify the effectiveness of our method.
Comment: Accepted by NeurIPS 2022. Code: https://github.com/sczhou/CodeFormer
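As a rough illustration of the code-prediction paradigm (a sketch under
assumed module sizes, not the released CodeFormer), the fragment below encodes
a low-quality face, lets a Transformer predict per-token indices into a
learned codebook of HQ visual atoms, and decodes the looked-up code vectors;
during training the logits would be supervised against ground-truth code
indices.

import torch
import torch.nn as nn

class CodePredictionSketch(nn.Module):
    """Restoration cast as code prediction: encode, predict codebook
    indices with a Transformer, look up the codes, decode."""
    def __init__(self, dim=256, n_codes=1024, tokens=256):
        super().__init__()
        self.encoder = nn.Sequential(              # stand-in conv encoder
            nn.Conv2d(3, dim, 4, stride=4), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=4))
        self.pos = nn.Parameter(torch.zeros(1, tokens, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        self.to_logits = nn.Linear(dim, n_codes)   # per-token code logits
        self.codebook = nn.Embedding(n_codes, dim) # learned HQ visual atoms
        self.decoder = nn.Sequential(              # stand-in decoder
            nn.ConvTranspose2d(dim, dim, 4, stride=4), nn.ReLU(),
            nn.ConvTranspose2d(dim, 3, 4, stride=4))

    def forward(self, lq):
        f = self.encoder(lq)                       # (B, C, 16, 16) at 256 px
        B, C, h, w = f.shape
        tok = f.flatten(2).transpose(1, 2) + self.pos
        logits = self.to_logits(self.transformer(tok))
        idx = logits.argmax(-1)                    # predicted code indices
        quant = self.codebook(idx).transpose(1, 2).reshape(B, C, h, w)
        return self.decoder(quant), logits         # logits trained with CE

if __name__ == "__main__":
    out, logits = CodePredictionSketch()(torch.randn(1, 3, 256, 256))
    print(out.shape, logits.shape)                 # (1,3,256,256), (1,256,1024)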
Understanding Deformable Alignment in Video Super-Resolution
Deformable convolution, originally proposed to adapt to the geometric
variations of objects, has recently shown compelling performance in aligning
multiple frames and is increasingly adopted for video super-resolution. Despite
its remarkable performance, its underlying mechanism for alignment remains
unclear. In this study, we carefully investigate the relation between
deformable alignment and the classic flow-based alignment. We show that
deformable convolution can be decomposed into a combination of spatial warping
and convolution. This decomposition reveals the commonality of deformable
alignment and flow-based alignment in formulation, but with a key difference in
their offset diversity. We further demonstrate through experiments that the
increased diversity in deformable alignment yields better-aligned features, and
hence significantly improves the quality of video super-resolution output.
Based on our observations, we propose an offset-fidelity loss that guides the
offset learning with optical flow. Experiments show that our loss successfully
avoids the overflow of offsets and alleviates the instability problem of
deformable alignment. Aside from the contributions to deformable alignment, our
formulation inspires a more flexible approach to introduce offset diversity to
flow-based alignment, improving its performance.
Comment: Tech report, 15 pages, 19 figures
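To indicate how optical flow can guide offset learning, here is a minimal
sketch of an offset-fidelity-style loss, assuming offsets of shape
(B, 2*groups, H, W) from the deformable groups and a precomputed flow of
shape (B, 2, H, W); the hinge margin and the exact form are assumptions for
illustration, not the paper's definition. Offsets are penalized only when
they stray beyond the margin, so their diversity around the flow is kept.

import torch

def offset_fidelity_loss(offsets, flow, margin=1.0):
    """Hinge penalty on |offset - flow| beyond a margin: keeps offsets from
    overflowing far from the flow while leaving room for diversity."""
    groups = offsets.shape[1] // 2
    dev = (offsets - flow.repeat(1, groups, 1, 1)).abs()
    return torch.clamp(dev - margin, min=0).mean()

if __name__ == "__main__":
    offsets = torch.randn(2, 2 * 8, 32, 32)        # 8 deformable groups
    flow = torch.randn(2, 2, 32, 32)
    print(offset_fidelity_loss(offsets, flow))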
Dual Associated Encoder for Face Restoration
Restoring facial details from low-quality (LQ) images has remained a
challenging problem due to its ill-posedness induced by various degradations in
the wild. The existing codebook prior mitigates the ill-posedness by leveraging
an autoencoder and a learned codebook of high-quality (HQ) features, achieving
remarkable quality. However, existing approaches in this paradigm frequently
depend on a single encoder pre-trained on HQ data for restoring HQ images,
disregarding the domain gap between LQ and HQ images. As a result, the encoding
of LQ inputs may be insufficient, resulting in suboptimal performance. To
tackle this problem, we propose a novel dual-branch framework named DAEFR. Our
method introduces an auxiliary LQ branch that extracts crucial information from
the LQ inputs. Additionally, we incorporate association training to promote
effective synergy between the two branches, enhancing code prediction and
output quality. We evaluate the effectiveness of DAEFR on both synthetic and
real-world datasets, demonstrating its superior performance in restoring facial
details.
Comment: Technical Report
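One plausible reading of the association training between the two branches is
a symmetric contrastive objective on paired LQ/HQ features; the sketch below
only illustrates that idea and is not claimed to be DAEFR's exact loss.

import torch
import torch.nn.functional as F

def association_loss(feat_lq, feat_hq, temperature=0.07):
    """CLIP-style symmetric InfoNCE over a batch: pull features of paired
    LQ/HQ images together, push non-pairs apart."""
    z_l = F.normalize(feat_lq.flatten(1), dim=1)   # (B, D) unit vectors
    z_h = F.normalize(feat_hq.flatten(1), dim=1)
    logits = z_l @ z_h.t() / temperature           # (B, B) similarities
    target = torch.arange(z_l.size(0), device=z_l.device)
    return 0.5 * (F.cross_entropy(logits, target)
                  + F.cross_entropy(logits.t(), target))

if __name__ == "__main__":
    lq = torch.randn(4, 256, 16, 16)               # LQ-branch features
    hq = torch.randn(4, 256, 16, 16)               # HQ-branch features
    print(association_loss(lq, hq))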
Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models
This paper proposes a method for generating images of customized objects
specified by users. The method is based on a general framework that bypasses
the lengthy optimization required by previous approaches, which often employ a
per-object optimization paradigm. Our framework adopts an encoder to capture
high-level identifiable semantics of objects, producing an object-specific
embedding with only a single feed-forward pass. The acquired object embedding
is then passed to a text-to-image synthesis model for subsequent generation. To
effectively blend an object-aware embedding space into a well-developed
text-to-image model under the same generation context, we investigate different
network designs and training strategies, and propose a simple yet effective
regularized joint training scheme with an object identity preservation loss.
Additionally, we propose a caption generation scheme that proves critical to
ensuring that the object-specific embedding is faithfully reflected in the
generation process while preserving control and editing abilities. Once trained,
the network is able to produce diverse content and styles, conditioned on both
texts and objects. We demonstrate through experiments that our proposed method
is able to synthesize images with compelling output quality, appearance
diversity, and object fidelity, without the need for test-time optimization.
Systematic studies are also conducted to analyze our models, providing insights
for future work.
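As a loose illustration of the single-pass idea (an assumption-laden sketch,
not the paper's architecture), the fragment below maps an object image to one
embedding in the text-conditioning space and appends it to the caption
tokens, alongside a cosine-based identity preservation term; all module
shapes and the loss form are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectEmbedder(nn.Module):
    """One feed-forward pass from an object image to a single conditioning
    token, replacing per-object test-time optimization."""
    def __init__(self, ctx_dim=768):
        super().__init__()
        self.backbone = nn.Sequential(             # stand-in image encoder
            nn.Conv2d(3, 64, 8, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.proj = nn.Linear(64, ctx_dim)

    def forward(self, object_image, text_tokens):
        obj = self.proj(self.backbone(object_image)).unsqueeze(1)
        return torch.cat([text_tokens, obj], dim=1)  # extended conditioning

def identity_preservation_loss(emb_source, emb_generated):
    """Keep the generated object's embedding close to the source object's."""
    return 1.0 - F.cosine_similarity(emb_source, emb_generated, dim=-1).mean()

if __name__ == "__main__":
    embed = ObjectEmbedder()
    cond = embed(torch.randn(2, 3, 224, 224), torch.randn(2, 77, 768))
    print(cond.shape)                              # torch.Size([2, 78, 768])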